Welcome to the Tuesday session.
Yesterday we began by considering a topic that most of you are already familiar with.
We talked about the Bayesian classifier, the Bayesian decision rule.
By maximizing the a posteriori probability we can build a classifier that is
optimal with respect to the average loss under a 0-1 loss function.
We also showed a small and intuitive proof of this fact.
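To spell out the rule being referred to (this is the standard formulation; the notation for the posterior is chosen for this write-up, not taken from the board):

```latex
% Bayes decision rule: choose the class with the largest posterior probability.
\hat{y}(x) = \arg\max_{y}\; p(y \mid x)

% Under the 0-1 loss, this rule minimizes the average loss, i.e. the error probability:
\mathbb{E}[L] = \int \bigl( 1 - \max_{y} p(y \mid x) \bigr)\, p(x)\, \mathrm{d}x .
```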
And then in the second chapter we started to look at logistic regression.
What we did is we applied a very simple trick.
We just rewrote the posterior using the Bayesian decomposition of the a posteriori probability,
which gives a known ratio.
And then we did a little trick.
We divided by the denominator, which gave us this expression, and then we
applied the exponential function and the logarithm.
So we mapped things back and forth and we ended up finally with a representation that
is the sigmoid function.
So the a posteriori probability can be rewritten by a very basic arithmetic operation as a
function that looks like that.
That's called the sigmoid function or the logistic function.
That's 1 over 1 plus e to the power of minus f of x.
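Spelled out for two classes y in {0, 1} (one way to carry out the rewriting just described: divide numerator and denominator by the first term, then rewrite the remaining ratio with exp and log; the symbols are chosen for this write-up):

```latex
p(y{=}1 \mid x)
  = \frac{p(x \mid y{=}1)\, p(y{=}1)}
         {p(x \mid y{=}1)\, p(y{=}1) + p(x \mid y{=}0)\, p(y{=}0)}
  = \frac{1}{1 + e^{-f(x)}},
\qquad
f(x) = \log \frac{p(x \mid y{=}1)\, p(y{=}1)}{p(x \mid y{=}0)\, p(y{=}0)} .
```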
And then we showed yesterday that f of x equal to 0 defines our
decision boundary.
So if you have a classification problem where you have two classes
and you have a decision boundary like that, it's exactly f of x is equal to 0.
That's the implicit representation of this.
And if you have to write down the a posteriori probability of the two classes, you know the
posterior is 1 over 1 plus e to the power of plus or minus f of x, depending on which class
you are considering.
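In the same notation as above, the two posteriors and the boundary condition read:

```latex
p(y{=}1 \mid x) = \frac{1}{1 + e^{-f(x)}},
\qquad
p(y{=}0 \mid x) = 1 - p(y{=}1 \mid x) = \frac{1}{1 + e^{+f(x)}} .
% On the decision boundary f(x) = 0, both posteriors are equal to 1/2.
```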
And the sigmoid function, for those of you who attended Pattern Recognition in the winter term,
is something you know from neural networks.
It is heavily used in neural networks, and we will see later that logistic regression
and the standard perceptron do pretty much the same thing.
So then we looked at the derivative of the sigmoid function, and it has the nice property
that the derivative is just the function times 1 minus the function.
That is also a very useful property, and we will reuse it later when we compute derivatives
to estimate the parameters of the function f of x.
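The computation behind this property is only one line; writing g(a) = 1 / (1 + e^{-a}):

```latex
g'(a) = \frac{e^{-a}}{\bigl(1 + e^{-a}\bigr)^{2}}
      = \frac{1}{1 + e^{-a}} \cdot \frac{e^{-a}}{1 + e^{-a}}
      = g(a)\,\bigl(1 - g(a)\bigr) .
```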
This is what these sigmoid functions look like.
They more or less approximate a step function.
Depending on the choice of the prefactor, you can get closer and closer to a sharp
step function.
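As a quick illustration (not part of the lecture material), a few lines of Python show how a prefactor a in g(a x) sharpens the sigmoid toward a step function; the name a and the chosen values are just for this sketch:

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(t):
    """Logistic function 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))

x = np.linspace(-6.0, 6.0, 400)
for a in (0.5, 1.0, 5.0, 20.0):
    # A larger prefactor a makes the transition around x = 0 steeper,
    # so the curve approaches a sharp step function.
    plt.plot(x, sigmoid(a * x), label=f"a = {a}")

plt.axhline(0.5, linestyle=":", linewidth=0.8)
plt.xlabel("x")
plt.ylabel("sigmoid(a * x)")
plt.legend()
plt.show()
```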
Then we looked at the decision boundary, and then we started to look at an example.
And this is a very, very important example that we have considered yesterday.
Let's assume the class conditional probability, so p of x given the class, is represented
by a Gaussian.
The formula for the Gaussian is something you should be able to write down by heart.
It's not a good strategy to show up in the oral exam without being able to write down
the Gaussian.
So be aware of that.
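For reference, the multivariate Gaussian density in d dimensions, with mean vector \mu and covariance matrix \Sigma, is:

```latex
\mathcal{N}(x \mid \mu, \Sigma)
  = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}}
    \exp\!\Bigl( -\tfrac{1}{2}\, (x - \mu)^{\top} \Sigma^{-1} (x - \mu) \Bigr) .
```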
And we have seen yesterday that once we assume that the class conditional probability is